[Gluon] [Fix] Fix HybridBlock when hybridize is not called #16465
Conversation
.format([type(ele) for ele in flatten_args]))
is_ndarray = True
exist_sym_nd = True
ctx = ele.context
This line, introduced in #16280, is not compatible with the previous context handling. Previously, x.context was always used as the default context. https://github.com/apache/incubator-mxnet/pull/16280/files#diff-29da832c2145752f3906a2f71c7b63baL982
Does it matter?
It is chosen like this because x can be None now.
Then it should be set to the first non-None argument, not the last?
Actually, all ctxs are supposed to be the same. For example, we should not allow mixing cpu and gpu contexts. However, we currently allow it because we need to be able to mix cpu, cpu_pinned, and cpu_shared.
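For reference, a minimal sketch of these three CPU-flavored contexts (illustrative only, assuming an MXNet build where the cpu_pinned and cpu_shared device types are available):

```python
import mxnet as mx

# Three CPU-flavored contexts. They compare as *different* Context objects,
# which is why a strict "all contexts must be equal" check would reject mixing them.
ctx_cpu = mx.cpu()                        # regular CPU memory
ctx_pinned = mx.cpu_pinned()              # page-locked host memory
ctx_shared = mx.Context('cpu_shared', 0)  # shared memory (used e.g. for data loading)

a = mx.nd.ones((2, 2), ctx=ctx_cpu)
b = mx.nd.ones((2, 2), ctx=ctx_pinned)

print(ctx_cpu == ctx_pinned)                    # False: distinct device types
print((a + b.as_in_context(ctx_cpu)).context)   # cpu(0)
```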
Thus we should use the first non-None argument so as not to break backwards compatibility? cpu, cpu_pinned, and cpu_shared are different contexts after all.
@leezu I think using the first or last non-None argument does not matter much here. Our goal is to make sure that we finally pick a meaningful context for the parameters. In fact, the previous implementation did not check whether the contexts of the arguments are valid.
As the previous implementation hasn't enforced all contexts being equal, we shouldn't start picking a different array to determine the context. As you stated above, it's valid to use a mix of cpu, cpu_pinned, and cpu_shared contexts.
For example, after your change, cpu_pinned or cpu_shared may be picked as the default context instead of cpu if the user passed a cpu_pinned or cpu_shared array as the last argument. The extra overhead could cause a performance regression (all parameters will be made available under the default context).
No need to risk this given there is no advantage?
@leezu It's also possible that previously cpu_pinned was picked as the default context and, after the change, the correct cpu context is picked instead. My point is that we probably need to give special treatment to cpu, cpu_pinned, and cpu_shared. What's your opinion?
@leezu I agree that the backward-compatibility issue is valid. Let me first make the change backward-compatible. However, this does not fix the issue of the cpu, cpu_pinned, cpu_shared combination.
We should get rid of choosing one array and using its context as the default context. For parameters, users should get the array via self.weight.data(ctx). For the time being I suggest not breaking the behaviour, to avoid unintended consequences.
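A minimal sketch of that pattern (the ScaleBlock name and its weight parameter are hypothetical, not taken from this PR): fetch the parameter array for an explicit ctx rather than relying on an inferred default context.

```python
import mxnet as mx
from mxnet.gluon import HybridBlock

class ScaleBlock(HybridBlock):
    """Hypothetical block that multiplies its input by a learned scalar weight."""
    def __init__(self, **kwargs):
        super(ScaleBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.weight = self.params.get('weight', shape=(1, 1), init=mx.init.One())

    def hybrid_forward(self, F, x, weight):
        return F.broadcast_mul(x, weight)

net = ScaleBlock()
net.initialize(ctx=mx.cpu())
out = net(mx.nd.ones((2, 3)))

# Access the parameter on an explicit context instead of an inferred one.
w_cpu = net.weight.data(mx.cpu())
print(w_cpu)
```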
python/mxnet/gluon/block.py (outdated)
raise ValueError('In HybridBlock, there must be one NDArray or one Symbol in the input.'
                 ' Please check the type of the args.\n')
if has_ndarray:
    ctx = next(iter(ctx_set))
set is unordered, thus next(iter(ctx_set)) may differ?
Nice catch, should sort it beforehand.
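A minimal sketch (not the exact patch in this PR) of picking the context deterministically, assuming ctx_set holds mxnet Context objects:

```python
import mxnet as mx

# Python set iteration order is not guaranteed, so sort by a stable key
# (device type id, device id) before picking a context.
ctx_set = {mx.cpu(0), mx.cpu_pinned(0)}
ctx = sorted(ctx_set, key=lambda c: (c.device_typeid, c.device_id))[0]
print(ctx)  # picks cpu(0) here, assuming cpu has the lowest device type id
```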
Force-pushed from dcc7a3b to feb7d55 with the following commits:
- fix bug
- fix
- fix
- fix lint
- fix backward-compatible for inferencing the ctx
- fix lint
- try to improve
- try to fix
- Update block.py
- Revert "Update block.py" (reverts commit bbcf41f)
- Revert "try to fix" (reverts commit 7d7f35c)
- Revert "try to improve" (reverts commit f510132)
- Revert "Revert "try to improve"" (reverts commit 872a7abe34c9afa97eb631fab8c3ce8558a22af8)
- Revert "Revert "try to fix"" (reverts commit 48e235dd4c9ee6a88d1a2515a8a1f3e57319c217)
- Revert "Revert "Update block.py"" (reverts commit e0d3949245050a4c60c4834db73c764d87fde6f7)
- fix
- fix lint
Description
When .hybridize() is not called, the forward function of HybridBlock supports a broader range of inputs, for example the combination of ndarray and scalar values. Such code runs smoothly without error; however, once hybridized, the same call triggers an error, as illustrated by the sketch below.
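A minimal sketch of the behaviour described above (an illustrative example, not the original snippet; the PlusScalar block name is hypothetical):

```python
import mxnet as mx
from mxnet.gluon import HybridBlock

class PlusScalar(HybridBlock):
    """Hypothetical block that adds its second argument to its first."""
    def hybrid_forward(self, F, x, y):
        return x + y

net = PlusScalar()
x = mx.nd.ones((2, 3))

print(net(x, 1.0))  # works: without hybridize(), plain NDArray semantics apply
net.hybridize()
print(net(x, 1.0))  # expected to raise: 1.0 is neither an NDArray nor a Symbol
```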
#16280 tried to make sure that HybridBlock raises an error in both cases. However, that made the code backward-incompatible. This PR fixes the backward-incompatibility and adds new tests for this behavior.